CpG Island Finding Using Graphical Models

Authors

  • Gang Ji
  • Tim Ng
  • Lingyun Huang
Abstract

CpG islands are short stretches of DNA sequence in which the frequency of cytosine (C) and guanine (G) is higher than in the background DNA sequence. They are found around the promoters of frequently expressed genes. The conventional way to recognize CpG islands is to use hidden Markov models (HMMs). Because HMMs are known to have difficulty capturing long-range information, they often do not provide satisfactory results. In this work, we try to find CpG islands with improved HMM systems (by introducing language model weights) as well as with other families of graphical models: dynamic Bayesian networks (DBNs) [11] and conditional random fields (CRFs) [9]. By assigning different weights to different kinds of links in an HMM, we obtain some improvement in recognition. Significant improvements can be achieved by adding dependencies to the observation variables, thereby changing the structure of the graphical model. The newly developed gene-trigram model reduces the equal error rate by 42.3% relative to the baseline system. Unfortunately, even though CRFs show large benefits in tasks such as text segmentation and part-of-speech tagging, they did not recognize any CpG islands in our preliminary experiments.
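
As a rough illustration of the HMM approach described in the abstract, the following is a minimal sketch of Viterbi decoding with a two-state HMM (island and background). It is not the authors' system. The emission, transition, and initial probabilities are made-up values chosen for illustration, and the `transition_weight` parameter is a hypothetical stand-in for the idea of weighting links in the model: it scales the log transition scores in the way a language model weight is commonly applied in speech recognition.

```python
import math

STATES = ("island", "background")

# Illustrative (assumed) parameters; none of these values come from the paper.
EMISSION = {
    "island": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
    "background": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
}
TRANSITION = {
    "island": {"island": 0.9, "background": 0.1},
    "background": {"island": 0.1, "background": 0.9},
}
INITIAL = {"island": 0.5, "background": 0.5}


def viterbi(sequence, transition_weight=1.0):
    """Return the most likely state label for each base of a DNA string.

    transition_weight multiplies the log transition scores (a hypothetical
    analogue of a language model weight); 1.0 gives a standard HMM.
    """
    if not sequence:
        return []

    def step(prev, cur):
        # Weighted log score of moving from state prev to state cur.
        return transition_weight * math.log(TRANSITION[prev][cur])

    # Log scores of the best path ending in each state at the first base.
    scores = {
        s: math.log(INITIAL[s]) + math.log(EMISSION[s][sequence[0]])
        for s in STATES
    }
    backpointers = []
    for symbol in sequence[1:]:
        new_scores, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda p: scores[p] + step(p, cur))
            new_scores[cur] = (
                scores[best_prev]
                + step(best_prev, cur)
                + math.log(EMISSION[cur][symbol])
            )
            pointers[cur] = best_prev
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    seq = "ATATATAT" + "CG" * 6 + "ATATATAT"
    labels = viterbi(seq)
    print("".join("I" if s == "island" else "-" for s in labels))
```

With these assumed parameters, raising `transition_weight` makes state changes more expensive, so short C/G-rich runs are less likely to be labeled as islands. The structural changes mentioned in the abstract (added dependencies between observation variables) are not covered by this sketch.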

Similar resources

Study of promoter CpG island hypermethylation of cyclin-dependent kinase inhibitor gene p21waf1/cip1 on some breast carcinoma cell lines

The p21 belongs to the CIP/KIP family of CDK inhibitors involved in cell cycle arrest at specific stages of the cell cycle progression. DNA methylation is the best-studied epigenetic mark that has been clearly associated with chromatin condensation and repression of gene transcription. The CpG island hypermethylation in promoter region of certain genes occurs in cancer cells and affects tumor...

CpG Island Mapping by Epigenome Prediction

CpG islands were originally identified by epigenetic and functional properties, namely, absence of DNA methylation and frequent promoter association. However, this concept was quickly replaced by simple DNA sequence criteria, which allowed for genome-wide annotation of CpG islands in the absence of large-scale epigenetic datasets. Although widely used, the current CpG island criteria incur sign...

Species-specific organization of CpG island promoters at mammalian homologous genes.

An essential issue derived from the sequencing of the human and other genomes is the identification of gene regulatory elements. Using in vivo footprinting and expression analysis, here we show that mouse and human CpG island promoters at homologous genes have a completely different organization in terms of size and binding of transcription factors. Despite these species-specific differences, a...

A nearly exhaustive search for CpG islands on whole chromosomes.

CpG islands are genome subsequences with an unexpectedly high number of CG di-nucleotides. They are typically identified using filtering criteria (e.g., G+C% expected vs. observed CpG ratio and length) and are computed using sliding window methods. Most such studies illusively assume an exhaustive search of CpG islands are achieved on the genome sequence of interest. We devise a Lexis diagram a...

Comparative analysis using K-mer and K-flank patterns provides evidence for CpG island sequence evolution in mammalian genomes

CpG islands are GC-rich regions often located in the 5' end of genes and normally protected from cytosine methylation in mammals. The important role of CpG islands in gene transcription strongly suggests evolutionary conservation in the mammalian genome. However, as CpG dinucleotides are over-represented in CpG islands, comparative CpG island analysis using conventional sequence analysis techni...

Publication date: 2004